1.  Introduction


Distributed  Network  Intrusion  Detection  Systems  (NIDS)  allow  a  relatively  small 
number of highly trained analysts to monitor a much larger number of sites; however,
they require information to be transmitted from the remote sensor to the central
analysis  system  (CAS).  Unless  an  expensive  dedicated  NIDS  network  is  employed, 
this  transmission  must  use  the  same  channels  that  the  site  uses  to  conduct  daily 
business.  This  makes  it  important  to  reduce  the  amount  of  information  transmitted 
back  to  the  CAS  to  minimize  the  impact  that  the  NIDS  has  on  daily  operations. 

One  possible  solution  is  to  do  the  processing  on  the  sensor  and  transmit  only  alerts. 
This solution has 3 serious problems. The first is that a sensor with a central
processing unit (CPU) overburdened with running analysis algorithms will not be able
to capture all of the packets that traverse the network. Smith et al. discussed the
impact of this packet loss.1,2 The second is that the alerts alone seldom provide
the analyst with the information necessary to determine if the attack was successful.
The third is that signature-based NIDS are ineffective at detecting zero-day or
advanced persistent threat attacks.

Another  possible  solution  is  to  do  very  little  processing  on  the  sensor  and  transmit 
all  of  the  information  captured  to  the  CAS.  This  frees  the  sensor  CPU  and  provides 
the  analyst  with  the  required  information  but  doubles  the  traffic  on  the  network  since 
every packet captured must be transmitted. Lossless compression is a possible
solution; however, this introduces additional latency, and the best lossless compression
algorithms are still not able to reduce the impact to daily operations enough.

One  reasonable  alternative  is  to  use  lossy  compression  as  a  solution.  This  solution 
introduces  its  own  set  of  problems,  as  it  requires  a  sound  method  to  determine 
the  likelihood  that  traffic  is  malicious  and  the  ability  to  take  that  likelihood  and 
determine  what  data  must  be  transmitted  and  what  data  are  safest  to  lose.  The  focus 
of  the  proposed  work  is  to  solve  these  problems. 

In 2004, Kerry Long described the Interrogator Intrusion Detection System
Architecture.3 In this architecture, remotely deployed sensors, known as Gators, collect
network traffic and transmit a subset of the traffic to the analysis level.3 Interrogator
employs “a dynamic network traffic selection algorithm called Snapper.”3 The
proposed effort will build on the work done with Interrogator to add an intelligent
lossy compression algorithm to the Snapper functionality.


This  intelligent  lossy  compression  algorithm  will  employ  expert  knowledge,  data 
mining,  and  best  of  breed  anomaly  detection  algorithms  to  assign  a  maliciousness 
score  to  each  session.  It  will  take  this  maliciousness  score  and  feed  it  into  a  Kelly 
criterion4-inspired selection formula to determine how much traffic from each
session to transmit to the CAS.

The remainder of this proposal is organized into the following sections. Section 2
will provide some background. Section 3 will provide a preliminary literature review.
Section 4 will present a complete definition of the problem. Section 5 will enumerate
the objectives of this research. Section 6 will enumerate our research questions.
Section 7 will outline the approach chosen to address this problem. Section 8 will
list the milestones. Section 9 will explain the results that we expect and how we will
analyze them to determine the effectiveness of the algorithm. Finally, Section 10 will
provide a brief summary of the proposal.

2.  Background 

To  implement  NIDS,  we  must  have  some  way  to  bring  the  relevant  traffic  back  to 
the  CAS.  One  popular  strategy  for  implementing  a  distributed  NIDS  is  to  do  all  of 
the  intrusion  detection  on  the  sensor  and  send  only  alerts  to  the  CAS.5,6  A  second 
strategy  might  be  to  use  lossless  compression  to  reduce  the  size  of  the  data  returned 
to the CAS. A third strategy is to implement some form of lossy compression
algorithm to send back relevant portions of traffic.

There  are  3  problems  with  sending  only  alerts  to  the  CAS.  The  first  is  that  it  has  the 
potential  to  overburden  the  sensor’s  CPU  and  introduce  packet  loss.  The  impact  of 
this  packet  loss  has  been  discussed  by  Smith  et  al.1,2  The  second  problem  is  that  the 
alerts  by  themselves  often  do  not  contain  enough  information  to  determine  whether 
the attack was successful. The third problem is that these systems are most often
implemented with signature-based intrusion detection engines. Signature-based
systems may be tuned to produce few false positives; however, they are ineffective
at detecting zero-day and advanced persistent threats.7

Approved for public release; distribution is unlimited.

For the second alternative, several lossless compression algorithms are available;
however, one of the most widely used is deflate, which is a variation of the LZ77
algorithm described by Ziv and Lempel.8 Compressing the 2009 Cyber Defense
Exercise dataset9 with GNU zip (gzip) provides a ratio of 56.4%. To minimize the
impact of NIDS on day-to-day operations, compression ratios of less than 10% are
required. Lossless compression alone will not provide a reasonable solution.
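
As a rough illustration of how such a ratio can be measured, the following minimal sketch uses Python's standard zlib module (a deflate implementation) to report compressed size as a percentage of original size. The function name compression_ratio_percent and the sample payloads are assumptions made for illustration; they are unrelated to the Cyber Defense Exercise dataset and do not reproduce the 56.4% figure.

```python
import os
import zlib


def compression_ratio_percent(data: bytes, level: int = 6) -> float:
    """Return compressed size as a percentage of the original size."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    return 100.0 * len(compressed) / len(data)


if __name__ == "__main__":
    # Hypothetical payloads: repetitive text compresses well, random bytes do not.
    repetitive = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n" * 200
    random_like = os.urandom(10_000)
    print(f"repetitive:  {compression_ratio_percent(repetitive):.1f}%")
    print(f"random-like: {compression_ratio_percent(random_like):.1f}%")
```

Payloads that are already encrypted or compressed behave like the random case, which is one reason a fixed target such as 10% cannot be guaranteed by lossless methods alone.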

The  concluding  reasonable  alternative  is  to  use  some  sort  of  lossy  compression 
strategy  to  provide  a  solution.  We  may  consider  network  traffic  to  be  composed 
of  sessions  that  span  spectrums  from  known  to  unknown  and  malicious  to  benign, 
as  illustrated  in  Fig.  1.  Quadrant  III,  the  known  malicious  quadrant,  is  the  domain 
of  intrusion  prevention  systems  as  described  by  Ierace  et  al.10  Specifically,  the  far 
lower-left  corner  of  quadrant  III,  or  the  intrusion  prevention  systems,  may  inflict  a 
denial of service attack upon the systems that they are protecting. We are most
interested in quadrant II, the unknown malicious quadrant, because that is the quadrant
where we will find evidence of zero-day and advanced persistent threat attacks. We
assume that malicious traffic makes up a small amount of the actual traffic on the
network. In 2004, Kerry Long described the Interrogator Intrusion Detection System
Architecture.3 In this architecture, remotely deployed sensors, known as Gators,
collect network traffic and transmit a subset of the traffic to the analysis level.
Interrogator employs “a dynamic network traffic selection algorithm called Snapper.”3
Long  and  Morgan  describe  how  they  used  data  mining  to  discover  known  benign 
traffic  that  they  excluded  from  the  data  transmitted  back  to  the  analysis  servers.11 


              Malicious    Benign

Unknown          II           I

Known            III          IV

Fig. 1  Network traffic composition

In this research, we propose to combine expert knowledge, data mining, and best of
breed anomaly-based NIDS solutions to compute a maliciousness factor. We then
propose to feed this maliciousness factor into a Kelly criterion4-inspired algorithm
to  compute  the  amount  of  traffic  in  each  session  that  will  be  transmitted  to  the  CAS. 
This  should  produce  a  lossy  compression  of  the  network  traffic  designed  to  reduce 
the  amount  of  benign  traffic  and  maximize  the  amount  of  malicious  traffic  being 
sent  to  the  CAS. 

3.  Preliminary  Literature  Review 

This research is broken down into 2 basic questions: 1) how to rate the maliciousness
of traffic and 2) how to use this rating to decide how much of each session to
send back to the CAS. We will answer the first question by exploring expert
knowledge, data mining, and anomaly detection solutions. We will answer the second
question by exploring the application of the Kelly criterion.

3.1  Session  Rating 

3.1.1  Data  Mining 

Lee and Stolfo used RIPPER12 on tcpdump13 data in their paper, “Data Mining
Approaches for Intrusion Detection.”14 The dataset they used from the Information
Exploration Shootout15 contained only the header information for the network
traffic and no user data. Lee and Stolfo condensed the network traffic into records
that look very much like Cisco netflow16 records. Then they were able to feed this
information into RIPPER to generate rules. Their initial efforts were unsuccessful;
however, once they added a time window into their analysis, they were able
to achieve promising results. Since their data only contained Internet Protocol (IP)
header information, and the positions of the exploits were not available to them,
they were not able to assess the accuracy of their results.

While  developing  the  Intelligent  Intrusion  Detection  System  at  Mississippi  State 
University,  Bridges  et  al.  integrated  fuzzy  logic,  association  rules,  and  frequency 
episodes  data  mining  techniques  to  increase  the  flexibility  of  the  system.17  Genetic 
algorithms  were  employed  to  tune  the  membership  functions  of  the  fuzzy  logic.18 

Dokas et al. addressed the problem of skewed class distribution in mining data for
network intrusion detection, which exists because malicious activity comprises less
than 2% of the network traffic, by applying several boosting strategies to
classification algorithms for rare classes as part of the data mining in the Minnesota
Intrusion Detection System (MINDS).19

In ARL-TR-4211, Using Basic Data Mining Techniques to Improve the Efficiency of
Intrusion Detection Analysis,11 Long and Morgan describe mining the Interrogator
database to discover known benign traffic to be excluded from the traffic transmitted
to the CAS. Their strategy was to exclude the most common day-to-day traffic
flowing to and from the most popular trusted sites.11

3.1.2  Anomaly-Based  Network  Intrusion  Detection 

In  their  history  and  overview  of  intrusion  detection,  Kemmerer  and  Vigna  confirm 
a  long-standing  belief  that  although  anomaly  detection  techniques  are  capable  of 
detecting  unknown  attacks,  these  techniques  pay  for  that  capability  with  a  high  false 
positive  rate.7  In  traditional  NIDS,  high  false  positive  rates  drain  valuable  time  for 
the  analysts.  In  this  application,  false  positives  simply  increase  the  amount  of  traffic 
transmitted.  This  is  a  cost  to  be  considered;  however,  it  is  a  much  smaller  price  to 
pay  than  that  paid  by  generating  an  alert  for  someone  to  analyze.  This  means  that 
a  significantly  higher  false  positive  rate  can  be  tolerated  in  this  application,  making 
algorithms  that  would  be  unusable  for  detection  attractive  for  rating  the  likelihood 
that  traffic  is  malicious.  There  has  been  a  significant  amount  of  work  using  anomaly 
detection  in  NIDS  applications.  Garcia-Teodoro  et  al.  reviewed  various  types  of 
anomaly-based  detection  techniques,  categorizing  them  as  either  statistics  based, 
knowledge  based,  or  machine  learning  based.20 

In 1994, Mukherjee et al. provided a survey of intrusion detection technology titled
“Network Intrusion Detection.”21 By today’s standards the title is somewhat deceiving
because almost all of the systems they surveyed are what would now be called
host-based intrusion detection systems. These systems tend to examine the individual
system’s audit logs looking for intrusive activity. The notable exception is
Network Security Monitor (NSM). NSM employs a System Description Language,
which is roughly modeled after a programming language and is used to describe the
complex relationship that may be inferred from observable objects. These complex
objects are analyzed using behavior-detection functions. NSM implements isolated
object analysis and integrated object analysis.22-24

Sekar et al. describe their experiences with specification-based intrusion detection.
They created a behavioral monitoring specification language and compiled it into
detection engines,25-27 validating their approach using the Defense Advanced
Research Projects Agency (DARPA) dataset.28

Eskin et al. describe an unsupervised anomaly detection framework where network
connections are mapped to a feature space and either cluster-based, k-nearest neighbor,
or support vector machine-based algorithms are used to find anomalies in the sparse
spaces. One of the key advantages to their approach is that it does not require
labeled or known normal data to train the engine.29
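
A minimal sketch of the k-nearest neighbor flavor of this idea, assuming connections have already been mapped to numeric feature vectors: a point whose k-th nearest neighbor is far away lies in a sparse region and receives a higher score. The function name knn_anomaly_scores, the choice of Euclidean distance, and the value of k are hypothetical and are not taken from Eskin et al.

```python
import math
from typing import List, Sequence


def knn_anomaly_scores(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Four tightly grouped points and one isolated point (illustrative data only).
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_anomaly_scores(sample, k=2))
```

No labels are needed: the score depends only on how sparse the neighborhood of each point is.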

Kruegel et al. developed a service-specific anomaly detection engine.30 This engine
contained a packet processing unit and a statistical processing unit. The packet
processing unit pulled packets from the network and reassembled them into service
requests. The statistical processing unit measured the type of request, length of
request, and content of the request. It then computed values that ranged from 1 to 15
for each of these aspects, such that greater deviation translated into higher
numbers. These values were then combined to provide an anomaly score. This score
was compared against a threshold that the authors suggested should be set so that the
system produces no more than 15 false positives a day. Because the deviation in
type, length, and content varies significantly between services and even the types of
requests, the statistical data must be partitioned by service and the length and
content by type. The algorithms may be used without change by any service,
although the packet processing unit may need to be adjusted per service.30

Ertoz et al. describe the MINDS.31-33 MINDS uses Cisco Netflow16 data to collect
statistics for 16 different features, half observed and half computed for each session.
For each session the local outlier factor is computed. Sessions with features that
contain very large local outlier factors are considered anomalous. Sessions then
undergo associated pattern analysis, which provides a summary of highly anomalous
traffic for the security analyst.31
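
For readers unfamiliar with the local outlier factor (LOF), the sketch below computes it for a small set of feature vectors. It is a simplified illustration and not the MINDS implementation: it assumes distinct points, uses exactly k neighbors (ties at the k-th distance are not expanded), and the function name local_outlier_factors is hypothetical.

```python
import math
from typing import List, Sequence


def local_outlier_factors(points: Sequence[Sequence[float]], k: int = 2) -> List[float]:
    """Return the LOF of each point; values well above 1 indicate outliers."""
    n = len(points)
    if k < 1 or k >= n:
        raise ValueError("k must satisfy 1 <= k < number of points")

    # neighbors[i] holds (distance, index) pairs for the k nearest neighbors of i.
    neighbors = []
    for i in range(n):
        ranked = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append(ranked[:k])
    k_distance = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(i, j) = max(k_distance[j], dist(i, j)).
    lrd = []
    for i in range(n):
        mean_reach = sum(max(k_distance[j], d) for d, j in neighbors[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(local_outlier_factors(sample, k=2))
```

Points inside a uniformly dense group score close to 1, while an isolated point scores far higher because its neighbors are much denser than it is.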

Munz  et  al.  describe  anomaly  detection  using  K-means  clustering.34  Similar  to 
Mukherjee  et  al.,  they  separate  the  analysis  for  each  service  or  port.  Similar  to  Ertoz 
et  al.,  they  work  with  Cisco  Netflow  data.16  Unlike  the  solutions  mentioned  above, 
this  one  requires  both  normal  and  attack  training  data  to  establish  initial  clusters. 
New  traffic  is  then  compared  to  the  established  clusters.34 
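
The comparison step described above can be sketched as a nearest-centroid lookup, assuming the clusters have already been established from training data. The centroid values, the two-feature layout, and the function name nearest_cluster are hypothetical; Munz et al. should be consulted for the actual features and distance measure.

```python
import math
from typing import Dict, Sequence, Tuple


def nearest_cluster(flow: Sequence[float],
                    centroids: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the label of the closest centroid and the distance to it."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    label = min(centroids, key=lambda name: math.dist(flow, centroids[name]))
    return label, math.dist(flow, centroids[label])


if __name__ == "__main__":
    # Hypothetical centroids over (packets per interval, kilobytes per interval).
    trained = {"normal": (100.0, 10.0), "attack": (5000.0, 400.0)}
    print(nearest_cluster((120.0, 12.0), trained))
    print(nearest_cluster((4800.0, 390.0), trained))
```

The returned distance could also serve as a graded input to a maliciousness rating instead of a hard label.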

Yassin et al. describe an approach that combines K-means clustering and naive
Bayes classification called KMC+NBC. They were able to validate their algorithm
against the ISCX 2012 Intrusion Detection Evaluation Dataset35 with strong positive
results.36

3.2  Session  Selecting 

In 1956, while working for Bell Telephone Laboratories, Kelly was developing a
way to assign a value measure to a communication channel.4 He described a
hypothetical illustration of a gambler who received advance notice about the outcome
of an event through a communication channel with a non-negligible error rate. By
doing this, Kelly was able to assign a cost value to the communication, achieving
his original goal. At the same time, he developed a formula based upon the
probability of winning and the rate of payoff that would provide an amount to bet l that,
if bet consistently over time, would achieve and maintain greater wealth than any
other value of l. This is shown in Eq. 1, where l is the fraction of wealth to bet, p is
the probability of winning, and b is the net odds of the wager.4

\[ l = \frac{bp - q}{b} = \frac{p(b + 1) - 1}{b}, \quad q = 1 - p \qquad (1) \]
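
As a small worked example of Eq. 1, the following sketch evaluates the fraction for a given win probability and net odds. The function name kelly_fraction is an assumption made for illustration.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Fraction of wealth to bet per Eq. 1: l = (b*p - q) / b, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0.0:
        raise ValueError("b (net odds) must be positive")
    q = 1.0 - p
    return (b * p - q) / b


if __name__ == "__main__":
    # Even-money wager (b = 1) with a 60% chance of winning.
    print(kelly_fraction(0.6, 1.0))
    # A fair coin at even money offers no edge.
    print(kelly_fraction(0.5, 1.0))
```

A negative result means the wager is unfavorable, in which case the criterion recommends betting nothing.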

Breiman uses Kelly’s work while discussing optimal gambling systems.37 He
considers  the  problem  of  how  much  to  bet  on  a  series  of  biased  coin  tosses.  To 
maximize  returns  on  each  toss,  one  would  bet  his  or  her  entire  fortune;  however, 
this  will  ultimately  ensure  ruin.  To  maximize  winning  and  avoid  ruin,  some  fixed 
fraction  of  wealth  will  be  bet  at  each  iteration.  He  uses  Kelly’s  work  to  discover 
that  fixed  fraction.37 
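
The trade-off described here can be checked numerically. The sketch below evaluates the expected logarithmic growth per toss for a fixed betting fraction and searches a grid for the best one; for a biased coin it lands on the closed-form value (bp - q)/b. The function names and the grid resolution are assumptions made for illustration.

```python
import math


def expected_log_growth(f: float, p: float, b: float) -> float:
    """Expected log growth of wealth per bet when wagering fraction f at net odds b."""
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)


def best_fraction_on_grid(p: float, b: float, steps: int = 1000) -> float:
    """Grid-search the fraction in [0, 1) that maximizes expected log growth."""
    grid = [i / steps for i in range(steps)]  # f = 1 is excluded: it risks total ruin
    return max(grid, key=lambda f: expected_log_growth(f, p, b))


if __name__ == "__main__":
    p, b = 0.6, 1.0
    print("best fraction on grid:", best_fraction_on_grid(p, b))
    print("growth when betting half of wealth:", expected_log_growth(0.5, p, b))
```

With p = 0.6 at even money, betting half of one's wealth on each toss yields negative expected log growth even though every individual toss is favorable, which is the ruin effect the paragraph describes.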

Thorp first wrote about applying mathematical theory to the game of Blackjack in
the 1960 paper “Fortune’s Formula: The Game of Blackjack”.38 Later Thorp published
the book Beat the Dealer, where he referred to what he called “The Kelly
Gambling System”.39 Although in his later work40 he mentions using the Kelly
criterion as the optimal way to bet in his research for Beat the Dealer, he mentions it
only once in passing in the book itself.39 The bulk of the book discusses the rules of
Blackjack and methods to determine when one has an advantage over the dealer and
how great that advantage might be. The Kelly criterion would be used to calculate how
large a bet to place based upon the size of the advantage. Instead of directly using the
Kelly criterion, he talks about placing big bets and little bets.39 In his paper
“Understanding the Kelly Criterion”, Thorp mentions the application of the Kelly criterion
to the stock market and his previous book Beat the Market;40 however, the Kelly
criterion is not mentioned at all in Beat the Market. Instead, Thorp concentrates on
how the market works, what short selling and warrants are all about, and how to
determine the relative value of a stock or a warrant.41 Thorp goes into greater detail
about how the Kelly criterion would be used in Blackjack and the stock market in
his paper “Optimal Gambling Systems for Favorable Games”.42 He goes into
even greater detail in his later work “The Kelly Criterion in Blackjack, Sports
Betting, and the Stock Market”, where he graphically illustrates how the log of wealth
is maximized to maximize the growth of wealth over time.43 He specifically applies
the criterion to the stock market in “The Kelly Criterion and the Stock Market”.44

Nekrasov created a formula for implementing the Kelly criterion in multivariate
portfolios, as seen in Eq. 2.45 Consider a market with n correlated stocks $S_k$ with
stochastic return $r_k$ and a riskless bond with return $r$. An investor puts a fraction
$u_k$ of his capital in $S_k$ and the rest is invested in bonds. The following formula may
be used to compute the optimum investments, where $\hat{\vec{r}}$ and $\hat{\Sigma}$ are
the vector of the means and the matrix of second mixed noncentral moments of the
excess returns.45

\[ \vec{u}^{*} = (1 + r)\,\hat{\Sigma}^{-1}\left(\hat{\vec{r}} - r\right) \qquad (2) \]
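
To make Eq. 2 concrete, the sketch below evaluates it in pure Python by solving the linear system with Gaussian elimination instead of forming the matrix inverse. The function names solve_linear and kelly_portfolio, and the sample moments, are assumptions for illustration; a production implementation would normally rely on a numerical library.

```python
from typing import List, Sequence


def solve_linear(a: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def kelly_portfolio(mean_returns: Sequence[float],
                    second_moments: Sequence[Sequence[float]],
                    riskless_rate: float) -> List[float]:
    """Evaluate Eq. 2: u* = (1 + r) * Sigma^-1 (r_hat - r)."""
    excess = [mu - riskless_rate for mu in mean_returns]
    base = solve_linear(second_moments, excess)
    return [(1.0 + riskless_rate) * u for u in base]


if __name__ == "__main__":
    # Two uncorrelated assets with illustrative (made-up) moments.
    print(kelly_portfolio([0.08, 0.05], [[0.04, 0.0], [0.0, 0.09]], 0.02))
```

Solving the system directly avoids an explicit inverse, which matters when the moment matrix is poorly conditioned.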


4.  Problem  Definition 

The scope of this proposal involves developing a packet capture tool that will
intelligently select portions of packets in an effort to return more data in the sessions
most likely to contain malicious data and less data in sessions most likely to be
benign, while effectively using a limited amount of bandwidth. This entails
developing an algorithm to maximize the efficient use of the available bandwidth and
selecting  and  refining  a  suite  of  algorithms  to  determine  the  likelihood  that  a  flow 
is  malicious. 

The  following  requirements  must  also  be  met  within  this  scope: 

• While exploring sensor-based packet loss, Smith et al.1 discuss the ways
in which packet loss was decreased by reducing the processing load of the
CPU.46-50 Since there is a direct relationship between the CPU load and the
number of packets dropped, it is important to reduce the load on the CPU as
much as practical.

•  Since  the  sensor  will  be  using  the  same  network  that  the  monitored  site  uses 
for  daily  operation,  the  amount  of  data  transmitted  must  be  kept  relatively 
small;  10%  of  the  monitored  traffic  is  considered  a  reasonable  maximum. 

•  As  sensors  employing  this  technology  may  be  deployed  in  Department  of 
Defense  networks,  it  is  important  that  the  software  developed  complies  with 
the joint information environment51-55 wherever applicable.

•  The  data  that  we  collect  will  need  to  be  analyzed  by  tools  contributed  by 
other  organizations;  therefore,  it  is  important  that  the  output  format  complies 
with  relevant  standards.  In  this  field  the  packet  capture  (PCAP)  data  format 
used  by  Tcpdump13  and  implemented  by  Libpcap56  is  the  de  facto  standard. 
The  packet  capture  tool  developed  must  support  those  standards  as  much  as 
practical. 

•  Since  the  tool  we  develop  will  implement  the  e-box  of  a  network  intrusion 
detection  system,  it  is  important  that  it  be  resistant  to  insertion,  evasion,  and 
denial  of  service  attacks  as  described  by  Ptacek  and  Newsham.57 

•  The  Kelly  criterion  assumes  that  one  is  able  to  bet  an  arbitrary  amount  of 
one’s  total  wealth;  however,  the  amount  of  data  that  may  be  transmitted  is 
limited  by  the  content  of  the  network  traffic.  It  will  be  necessary  to  account 
for  situations  where  the  amount  of  traffic  available  for  transmission  is  less 
than  the  amount  that  the  algorithm  indicates. 

•  The  Kelly  criterion  assumes  that  one  gets  paid  in  the  same  currency  with 
which  one  bets,  implying  that  winning  increases  the  wealth.  This  is  not  the 
case;  rather,  there  is  a  steady  income  that  does  not  increase  with  each  win.  It 
is  quite  possible  that  the  algorithm  may  need  to  be  significantly  adjusted  to 
account  for  this.  Kelly  himself  stated  that  the  formula  would  be  significantly 
different  if  the  gambler  were  to  be  on  a  fixed  budget  and  unable  to  reinvest 
his  winnings.4 

• We will need to discover the optimal collection window. Comparing the
problem of selecting the amount of traffic to transmit to the CAS to how much to
bet on horses in a race or how much to invest in stocks in a portfolio, we
can see that the problem is basically this: given a set of choices and predicted
outcomes, how much should we invest in each choice? In the horse
racing problem, the choices are limited by the number of horses in the race.
In the portfolio problem, the choices are limited by the number of stocks in
the portfolio. In this case, the choices are limited by the size and duration
of the window we use. If the window is small and short enough, then the
choices will be limited to a single session. If the window is large and long
enough, then the choices could expand significantly. The size and duration of
the window also affect the amount of information that may be included in the
maliciousness score. If the window is very short, then the only information
that we may be able to access is the source and target IP addresses and ports. If
the window is longer, we may be able to include a test of the entropy of the
data to exclude encrypted traffic. If the window is long enough, we may be
able to include the results of static NID systems.
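
The entropy test mentioned in the last requirement could look roughly like the sketch below, which measures the Shannon entropy of a payload's byte distribution. The function names and the 7.5 bits-per-byte threshold are assumptions made for illustration; compressed data also has high entropy, so this heuristic cannot distinguish compression from encryption.

```python
import math
from collections import Counter


def shannon_entropy_bits_per_byte(payload: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(payload: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical heuristic: flag payloads whose entropy is near the 8-bit maximum."""
    return shannon_entropy_bits_per_byte(payload) >= threshold


if __name__ == "__main__":
    text_like = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n" * 50
    uniform = bytes(range(256)) * 4
    print(shannon_entropy_bits_per_byte(text_like), looks_encrypted(text_like))
    print(shannon_entropy_bits_per_byte(uniform), looks_encrypted(uniform))
```

Because the estimate needs enough bytes to be meaningful, it is only usable once the window has collected a reasonable amount of payload, which is consistent with the window-length dependence described above.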

5.  Objective 

The objective of this research is to build a network capture tool that captures
malicious traffic. It must outperform the existing Vsnap network capture tool, which is
the successor to the Snapper network capture tool described by Long.3 The Vsnap
tool uses a dynamic algorithm with limited intelligence for choosing which packets
and how much of each packet to collect. The tool we develop will combine expert
knowledge, data mining, and best of breed anomaly detection to create a maliciousness
score for each session. This score will be fed into a Kelly criterion-inspired
algorithm to decide how much of each session to transmit.

6.  Research  Questions 

To complete our objectives, 3 primary research questions must be addressed:


1.  What  is  the  optimal  way  to  combine  expert  knowledge,  data  mining,  and 
anomaly  detection  into  a  single  maliciousness  rating? 

2.  What  is  the  optimal  strategy  for  selecting  how  much  traffic  should  be  captured 
for  a  given  session  based  upon  its  maliciousness  rating? 

3.  What  is  the  optimal  window  of  selection? 

(a) How long is the average network session?

(b)  What  is  the  tolerable  latency  between  collection  and  transmission  to  the 
CAS? 

(c)  How  computationally  expensive  may  the  algorithm  be  before  it  causes 
the  system  to  lose  packets? 

(d)  What  level  of  packet  loss  is  acceptable? 

7.  Approach 

This research effort is broken down into 3 research questions and 3 phases. The
first question, which will be addressed in phase 1, is how to determine what traffic
is most likely to contain malicious activity. The second question, which will be
addressed in phase 2, is how to select the traffic most likely to contain malicious
activity for transmission to the analysis servers. The third research question and its
subquestions, which will be addressed in phase 1, concern the optimal window
of selection. In phase 3 the prototype developed in phase 2 will be incorporated
into the Interrogator NIDS Architecture.

7.1  Phase  1 

In phase 1, we plan to combine expert knowledge, data mining, and best of breed
intrusion detection to compute a maliciousness rating. The first step of this research
will be to discover the relevant facts that may be gleaned from expert knowledge.
For example, when the Heartbleed vulnerability was discovered, an expert could
have caused the system to rate Secure Sockets Layer traffic higher; and when a known
malicious IP address or domain is discovered, an expert could cause the system to
rate traffic that includes that IP address or domain higher. The second step of this
research will be to discover the relevant facts that may be mined from the Interrogator
data store. For example, Long and Morgan mined Interrogator to develop a white list of
web servers to be excluded and instances of new servers to be included.11 This could be
expanded to rate as more malicious any traffic that contains addresses and ports
associated with alerts or incidents. The third step of this research will combine best of
breed anomaly detection algorithms to form a maliciousness rating. For example,
MINDS collected, computed, and assigned a local outlier factor to 16 different
features.31-33 KMC+NBC uses K-means clustering and naive Bayes classification to
detect anomalies in network traffic.36 Again, a measure of abnormality could factor
into the session rating. The fourth step of this research will be to develop a formula
to combine all of these into a single score. Phase 1 corresponds to the top half of
Fig. 2, where unrated sessions are captured by the sensor and flow into the session
rater, which uses expert knowledge, mined data, and anomaly algorithms to rate
each session. The green sessions are known benign, the red sessions are known
malicious, and the colors in between are meant to represent the continuum in between.
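
The combining formula is an open research question, so the following is only a placeholder sketch of the fourth step: a weighted average of three component scores, each assumed to be normalized to the range 0 to 1. The SessionEvidence type, the field names, and the default weights are all hypothetical and do not come from this proposal.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SessionEvidence:
    expert: float   # 0..1, contribution of expert-supplied rules (assumed scale)
    mined: float    # 0..1, contribution of facts mined from the data store
    anomaly: float  # 0..1, normalized anomaly-detection score


def maliciousness_score(evidence: SessionEvidence,
                        weights: Sequence[float] = (0.4, 0.3, 0.3)) -> float:
    """Combine the three component scores into one rating in the range 0 to 1."""
    parts = (evidence.expert, evidence.mined, evidence.anomaly)
    if any(not 0.0 <= x <= 1.0 for x in parts):
        raise ValueError("component scores must lie in [0, 1]")
    if len(weights) != 3 or any(w < 0.0 for w in weights) or sum(weights) <= 0.0:
        raise ValueError("weights must be 3 non-negative values with a positive sum")
    return sum(w * x for w, x in zip(weights, parts)) / sum(weights)


if __name__ == "__main__":
    # A session flagged by an expert rule with moderate anomaly evidence.
    print(maliciousness_score(SessionEvidence(expert=1.0, mined=0.2, anomaly=0.5)))
```

A linear combination is the simplest candidate; other forms, such as taking the maximum component, would let a single strong signal dominate and may be worth comparing in phase 1.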

7.2  Phase  2

In  phase  2,  we  plan  to  develop  a  Kelly  criterion-based  formula  that  takes  the  scores 
generated  from  phase  1  as  input  and  produces  as  output  a  fraction  of  the  available 
network traffic that should be invested in each session. Kelly proved that there exists
an amount to bet l, being some portion of the total wealth G, such that if the gambler
bets it consistently, G will obtain and maintain a level greater than it would for any
other possible value of l.4 This may be seen in Eq. 1, where l is the fraction of wealth
to bet, p is the probability of winning, and b is the net odds of the wager. Thorp applied
the Kelly criterion to the game of Blackjack.39 Smoczynski and Tomkins applied the
Kelly criterion to horse racing.58 Separately, Thorp and Nekrasov applied the Kelly
criterion to the stock market.41,45 Using this generalization, one would consider
network  flows  to  be  stocks  and  rate  of  return  to  be  the  maliciousness  score  of  the 
session.  Phase  2  corresponds  to  the  bottom  half  of  Fig.  2,  where  the  rated  sessions 
flow  into  the  algorithm  and  the  session  selector  feeds  those  ratings  into  the  Kelly 
criterion4-inspired  formula  to  determine  how  much  traffic  to  invest  in  each  session. 
The  fatter  sessions  represent  more  traffic  being  invested  in  the  session,  and  the 
skinnier  sessions  represent  less  traffic  being  invested  in  the  session. 

We will use Nekrasov’s formula in Eq. 2 to illustrate how this might work. To
apply it to our problem, we will substitute the maliciousness score for the
returns and the amount of available traffic assigned to each session for the
investment. Since a riskless bond makes no sense in our problem, we will set its
return to zero, which simplifies the equation to Eq. 3. This leaves us with only
one variable because the second noncentral moment is a function of the
maliciousness rating over time. Note, however, that Nekrasov’s formula is
unlikely to work as given. We need to begin from the same starting point as Kelly
and retrace his steps to construct a formula for this specific application.

\[ \vec{u}^{*} = \hat{\Sigma}^{-1}\,\hat{\vec{r}} \qquad (3) \]
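
As a minimal sketch of how Eq. 3 might be evaluated, the following Python estimates the vector of means and the matrix of second mixed noncentral moments from a short history of maliciousness scores, then solves for the allocation vector. The function names, the sample history, the clipping of negative entries, and the normalization to fractions of the available traffic are illustrative assumptions; they are not part of Nekrasov’s formula or of the algorithm to be developed.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mean_vector(history: Sequence[Sequence[float]]) -> List[float]:
    """Mean score per session; history[t][k] is session k's score at time t."""
    n = len(history)
    return [sum(row[k] for row in history) / n for k in range(len(history[0]))]


def second_moment_matrix(history: Sequence[Sequence[float]]) -> Matrix:
    """Second mixed noncentral moments: entry (i, j) is the mean of x_i * x_j."""
    n = len(history)
    m = len(history[0])
    return [[sum(row[i] * row[j] for row in history) / n for j in range(m)]
            for i in range(m)]


def solve_linear(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("moment matrix is singular; scores vary too little")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def allocate_traffic(history: Sequence[Sequence[float]]) -> List[float]:
    """Evaluate u* = Sigma^-1 r, then map it to traffic fractions.

    Assumption: negative entries are clipped to zero and the remainder is
    rescaled to sum to 1. This step is not part of Eq. 3.
    """
    u = solve_linear(second_moment_matrix(history), mean_vector(history))
    clipped = [max(v, 0.0) for v in u]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(u)] * len(u)
    return [v / total for v in clipped]


if __name__ == "__main__":
    # Hypothetical scores for three sessions observed at four points in time.
    scores = [[0.9, 0.1, 0.5],
              [0.7, 0.3, 0.4],
              [0.8, 0.2, 0.9],
              [0.6, 0.1, 0.2]]
    print(allocate_traffic(scores))
```

The solver raises an error when the scores do not vary enough for the moment matrix to be inverted, which is one practical way the unmodified formula can fail on session data.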

Once the session rater and session selector algorithms are developed, they will be
incorporated into a prototype that will be tested against open source datasets,
including those used by Smith et al. in their theoretical exploration1 and data
collected by the Army Research Laboratory (ARL) Computer Network Defense
Service Provider (CNDSP).

7.3  Phase  3 

In phase 3, the prototype from phase 2 will be developed into a production
application that addresses all of the requirements from Section 4 and may be
incorporated into the Interrogator NIDS Architecture.3

8.  Milestones 

Milestones  for  phase  1  include  the  following: 

•  Produce a validated list of facts that may be discovered through expert
knowledge.

•  Select and evaluate several techniques for mining the Interrogator data store.

•  Produce a validated list of facts that may be discerned by mining the
Interrogator data store.

•  Select  and  evaluate  several  anomaly  detection  algorithms  against  open  source 
datasets  and  data  collected  from  the  ARL  CNDSP. 

•  Produce  a  validated  list  of  facts  that  may  be  discerned  by  executing  these 
anomaly  detection  algorithms  against  network  traffic. 

•  Combine  all  of  these  into  a  maliciousness  rating. 

Milestones  for  phase  2  include  the  following: 

•  Develop a Kelly criterion-inspired formula that takes as input the facts
discovered in phase 1.

•  Develop  a  prototype  network  compression  tool  that  uses  our  collection  of 
facts  and  our  modified  Kelly  criterion  to  produce  compressed  network  traffic. 

•  Assess  this  prototype  based  upon  its  ability  to  compress  the  traffic  and  the 
amount  of  malicious  activity  in  the  original  data  but  not  in  the  compressed 
data. 

Milestones  for  phase  3  include  the  following: 

•  Incorporate  prototype  from  phase  2  into  the  Interrogator  Network  Intrusion 
Detection  Architecture.3 

•  Evaluate  its  performance  in  a  relevant  environment. 

9.  Expected  Results 

We have access to several public datasets of network traffic. We also have access
to the Vsnap network capture tool, which is the successor of the Snapper network
capture tool described by Long.3 We will be able to process our datasets with both
Vsnap and the Kelly compressor and compare which captures more malicious
traffic. We will also be able to measure the amount of benign traffic each tool
captures and compare their relative densities.
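
The comparison could be quantified along the following lines. This is a sketch only: it assumes ground-truth labels are available for the dataset so that captured bytes can be split into malicious and benign, and the class name, function names, and byte counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capture:
    """Bytes retained by one capture tool, split by ground-truth label."""
    malicious_bytes: int
    benign_bytes: int

    @property
    def total_bytes(self) -> int:
        return self.malicious_bytes + self.benign_bytes


def malicious_density(capture: Capture) -> float:
    """Share of the captured traffic that is malicious."""
    if capture.total_bytes == 0:
        return 0.0
    return capture.malicious_bytes / capture.total_bytes


def malicious_retained(capture: Capture, original_malicious_bytes: int) -> float:
    """Share of the original malicious traffic that survived capture."""
    if original_malicious_bytes <= 0:
        raise ValueError("original_malicious_bytes must be positive")
    return capture.malicious_bytes / original_malicious_bytes


def fraction_kept(capture: Capture, original_total_bytes: int) -> float:
    """Share of the original traffic volume that would be sent to the CAS."""
    if original_total_bytes <= 0:
        raise ValueError("original_total_bytes must be positive")
    return capture.total_bytes / original_total_bytes


if __name__ == "__main__":
    # Hypothetical dataset: 1000 bytes in total, 100 of them malicious.
    tool_a = Capture(malicious_bytes=80, benign_bytes=120)
    tool_b = Capture(malicious_bytes=60, benign_bytes=340)
    for name, cap in (("tool A", tool_a), ("tool B", tool_b)):
        print(name,
              malicious_density(cap),
              malicious_retained(cap, 100),
              fraction_kept(cap, 1000))
```

Reporting density alongside the retained share keeps a tool from looking good merely by capturing very little traffic.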

10.  Conclusion


In  a  distributed  NIDS  environment,  it  is  necessary  to  transmit  the  right  data  back 
to  the  central  analysis  servers  to  provide  analysts  with  the  information  necessary 
to  detect  and  report  malicious  activity.  Bringing  back  all  of  the  data  would  double 
the  bandwidth  requirements  of  the  site  and  require  that  the  analysis  servers  have 
massive  bandwidth  available  to  receive  it  all.  Standard  lossless  compression  is  not 
sufficient  to  reduce  this  traffic  to  an  acceptable  level.  The  goal  of  this  research  is 
to  develop  a  lossy  compression  algorithm  that  will  ensure  that  the  traffic  lost  is  the 
least likely to contain malicious activity. The approach is to use an algorithm
based upon the Kelly criterion to allocate the limited bandwidth available,
coupled with best-of-breed anomaly detection to assess the maliciousness of the
traffic. These 2 technologies will be combined into a packet capture tool that
will produce data compliant with the standards used by existing NIDS tools.

List  of  Symbols,  Abbreviations,  and  Acronyms


ACRONYMS: 

ARL  Army  Research  Laboratory 
CAS  central analysis system
CNDSP  computer  network  defense  service  provider 
CPU  central  processing  unit 
IP  internet protocol
MINDS  Minnesota INtrusion Detection System
NIDS  network intrusion detection system
NSM  Network Security Monitor
PCAP  packet capture

MATHEMATICAL SYMBOLS:

l  the  amount  to  bet  from  the  Kelly  criterion 
G  total  wealth  from  the  Kelly  criterion 
p  the  probability  of  winning 
b  the  net  odds  of  the  wager 
$S_k$  the stochastic return
$r_k$  return of a riskless bond with return $r$
$u_k$  a fraction $u_k$ of capital
$\hat{\vec{r}}$  the vector of the means
$\hat{\Sigma}$  the matrix of second mixed noncentral moments of the excess returns